Generating C4.5 Production Rules in Parallel
Abstract
Induction systems that represent concepts in the form of production rules have proven to be useful in a variety of domains where both accuracy and comprehensibility of the resulting models are important. However, the computational requirements for inducing a set of rules from large, noisy training sets can be enormous, so that techniques for improving the performance of rule induction systems by exploiting parallelism are of considerable interest. Recent work to parallelize the C4.5 rule generator algorithm is described. After presenting an overview of the algorithm and the parallelization strategy employed, empirical results of the parallel implementation that demonstrate substantial speedup over serial execution are provided.
Similar resources
Scaling Up the Rule Generation of C4.5
C4.5 is the most well-known inductive learning algorithm. It can be used to build decision trees as well as production rules. Production rules are a very common formalism for representing and using knowledge in many real-world domains. C4.5 generates production rules from raw trees. It has been shown that the set of production rules is usually both simpler and more accurate than the decision tr...
Detecting Symptoms of Low Performance Using Production Rules
E-Learning systems offer students innovative and attractive ways of learning through augmentation or substitution of traditional lectures and exercises with online learning material. Such material can be accessed at any time from anywhere using different devices, and can be personalized according to the individual student’s needs, goals and knowledge. However, authoring and evaluation of this m...
Generating Accurate Rule Sets Without Global Optimization
The two dominant schemes for rule-learning, C4.5 and RIPPER, both operate in two stages. First they induce an initial rule set and then they refine it using a rather complex optimization stage that discards (C4.5) or adjusts (RIPPER) individual rules to make them work better together. In contrast, this paper shows how good rule sets can be learned one rule at a time, without any need for global...
CPAR: Classification based on Predictive Association Rules
Recent studies in data mining have proposed a new classification approach, called associative classification, which, according to several reports, such as [7, 6], achieves higher classification accuracy than traditional classification approaches such as C4.5. However, the approach also suffers from two major deficiencies: (1) it generates a very large number of association rules, which leads to...
Data mining with a parallel rule induction system based on gene expression programming
A parallel rule induction system based on gene expression programming (GEP) is reported in this paper. The system was developed for data classification. The parallel processing environment was implemented on a cluster using a message-passing interface. A master-slave GEP was implemented according to the Michigan approach for representing a solution for a classification problem. A multiple maste...